Abstract 

In this paper, we study the algorithmic complexity of the Mastermind game, where results 
are single-color black pegs. This differs from the usual dual-color version of the game, but 
better corresponds to applications in genetics. We show that it is NP-complete to determine 
if a sequence of single-color Mastermind results has a satisfying vector. We also show how 
to devise efficient algorithms for discovering a hidden vector through single-color queries. 
Indeed, our algorithm improves a previous method of Chvatal by almost a factor of 2. 

1 Introduction 

Mastermind [2,4] is a game played between two players — a codemaker and a codebreaker — using 
colored pegs. (See Figure 1.) 

Viewed mathematically, Mastermind is abstracted as a game where the codemaker selects a 
plaintext vector, V, of length N, whose elements are selected from an alphabet of size K. For 
consistency with the board game, the members of this alphabet are often referred to as "colors." 
The codemaker and codebreaker both know the values of N and K, and game play consists of the 
codebreaker repeatedly making guesses, $V_1, V_2, \ldots$, about the identity of V. For each guess $V_i$, the 
codemaker provides a score on how well $V_i$ matches V. In double-count Mastermind, which is the 
standard version based on the board game, this score consists of a pair of numbers: 

• A black count, $b(V_i)$, which is the number of elements in $V_i$ and V that match in both value 
and location. That is, 

$$b(V_i) = |\{\, j : V_i[j] = V[j] \,\}|.$$

• A white count, $w(V_i)$, which is the number of elements in $V_i$ that appear in V but in different 
locations than their locations in V. That is, letting $\pi$ denote an arbitrary permutation, 

$$w(V_i) = \max_{\pi} |\{\, j : V_i[\pi(j)] = V[j] \,\}| - b(V_i).$$

Figure 1: The Mastermind game. The four large pegs in the middle are used for guessing. The 
four smaller peg locations on the left are used to score each guess with black-peg and white-peg 
scores. (Image, Copyright 2009, Michael T. Goodrich. Used with permission.) 

In single-count Mastermind, which has been less studied, the codebreaker is given only the 
black-peg count, $b(V_i)$, for each guess, $V_i$. (Note that it is impossible to solve the problem given only 
white-count scores.) The goal is for the codebreaker to discover V using a small number of 
guesses. 
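
The two scores defined above can be computed directly. The following is a minimal Python sketch; the function names `black_count` and `white_count` are illustrative and do not come from the paper. It relies on the observation that the maximum over permutations in the white-count definition equals the size of the multiset intersection of the colors in the two vectors.

```python
from collections import Counter
from typing import Sequence


def black_count(guess: Sequence[int], secret: Sequence[int]) -> int:
    """Number of positions where guess and secret agree in value and location."""
    return sum(g == s for g, s in zip(guess, secret))


def white_count(guess: Sequence[int], secret: Sequence[int]) -> int:
    """Colors shared by both vectors (with multiplicity) that are not black matches."""
    # Counter & Counter keeps, for each color, the minimum of the two counts,
    # which is the best any permutation of the guess could match.
    shared = sum((Counter(guess) & Counter(secret)).values())
    return shared - black_count(guess, secret)
```

In single-count Mastermind the codebreaker would see only the value returned by `black_count`.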

1.1 Previous Related Work 

The original Mastermind game was invented in 1970 by Meirowitz, as a board game having holes 
for sequences of length N = 4 and K = 6 colored pegs. Knuth [4] subsequently showed that 
this instance of the Mastermind game can be solved in five guesses or fewer. Chvatal [2] studied 
the combinatorics of general Mastermind, showing that it can be solved in polynomial time, in the 
K > N case, using $2N\lceil \log K \rceil + 4N$ guesses, and Chen et al. [1] showed how this bound can be 
improved, in this case, to $2N\lceil \log N \rceil + 2N + \lceil K/N \rceil + 2$ guesses. Stuckman and Zhang [6] showed 
that it is NP-complete to determine if a sequence of guesses and responses in general double-count 
Mastermind is satisfiable. 

1.2 Our Results 

In this paper we study the single-count (black-peg) version of Mastermind. Such a scenario is 
motivated by genomic data, where a genomic database owner, Dave, can "play" a type of 
Mastermind game with a genomic query string Q, for which a querier thinks that he is querying Dave 
in a privacy-preserving manner, but instead Dave is discovering the full identity of Q. That is, 
Q is iteratively compared with strings provided by Dave (assumed to be from his database, D), 
with each comparison done in a privacy-preserving online manner, so that all that is learned from 
each comparison is the score measuring the similarity of the two strings, with the (black-peg) score 
for each string comparison being revealed to the database owner (and possibly also the owner of Q) 
before the next comparison begins. 

We begin our discussion by showing that, in fact, the problem of determining whether a 
sequence of Mastermind responses has a valid solution is NP-complete, even if each response is a 
single-count response. In addition to the NP-completeness result, we show that an arbitrary query 
string, Q, of length N from an alphabet of size K, can be discovered with 
$N\lceil \log K \rceil + \lceil (2 - 1/K)N \rceil + K$ guesses, each of which receives a single-count response. 
This improves the Chvatal upper bound by almost a factor of 2. 

2 Black-Peg Mastermind is NP-Complete 

As mentioned above, Stuckman and Zhang [6] show that double-count Mastermind satisfiability 
is NP-complete. Unfortunately, their proof, which is based on a reduction from the well-known 
Vertex Cover problem, does not translate into a proof that single-count Mastermind satisfiability 
is NP-complete. So we provide such a proof in this section. The implication of this fact is 
that finding a vector that satisfies an arbitrary sequence of Mastermind queries should be considered 
computationally infeasible. 

In the single-count Mastermind satisfiability problem, we are given a sequence of Mastermind 
queries, $V_1, V_2, \ldots, V_N$, and the responses, $b(V_1), b(V_2), \ldots, b(V_N)$, each of which is said to report 
the number of indices such that the characters in a $V_i$ and an unknown vector, V, at these locations 
match. We are asked to determine if there indeed exists a vector V that satisfies all of these 
responses. 

Theorem 1: Single-count Mastermind satisfiability is NP-complete. 

Proof: It is easy to see that single-count Mastermind satisfiability is in NP. For example, we could 
nondeterministically guess a vector V and then test in polynomial time whether it satisfies all the 
responses, $b(V_1), b(V_2), \ldots, b(V_N)$. 

To prove that single-count Mastermind satisfiability is NP-hard, we provide a reduction from 
3-Dimensional Matching (3DM), which is a well-known NP-complete problem (e.g., see [3]). In 
the 3DM problem, we are given three sets, 

$$X = \{x_1, \ldots, x_n\}, \quad Y = \{y_1, \ldots, y_n\}, \quad \text{and} \quad Z = \{z_1, \ldots, z_n\},$$

of n elements each. In addition, we are given a set T of m triples, 
$\{(x_{i_1}, y_{j_1}, z_{k_1}), \ldots, (x_{i_m}, y_{j_m}, z_{k_m})\}$, 
whose elements are respectively taken from the three sets, X, Y, and Z. The problem is to 
determine if there is a subset of triples such that each element in X, Y, and Z appears in exactly one 
triple in this subset. 

Suppose, then, that we are given an instance of the 3DM problem, as described above. We 
consider the unknown vector, V, to consist of the following sequence of variables: 

$$(X_1, \ldots, X_n;\; Y_1, \ldots, Y_n;\; Z_1, \ldots, Z_n;\; T_1, \ldots, T_m),$$

where the semi-colons are used for the sake of notation to separate the four sections in the unknown 
vector, V. We perform our reduction by constructing a sequence of guess vectors, $V_1, V_2, \ldots, V_N$, 
together with their responses, $b(V_1), b(V_2), \ldots, b(V_N)$, so that there is a satisfying vector V for 
these responses if and only if there is a solution to the given instance of the 3DM problem. Our 
construction begins by setting the number of colors, K, to be m + 1. Intuitively, there is a color 
associated with each triple in T, plus a "null" color, 0, which is guaranteed not to appear in our 
unknown vector, V. We begin our sequence of queries with three special "enforcer" queries: 

$$V_1 = (0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0),$$

which has response $b(V_1) = 0$, 

$$V_2 = (0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0;\; 1, 1, \ldots, 1),$$

which has response $b(V_2) = n$, and 

$$V_3 = (0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0;\; 0, 0, \ldots, 0),$$

which has response $b(V_3) = m - n$. Intuitively, $V_1$ enforces the fact that the null color, 0, cannot 
appear in the unknown vector, $V_2$ enforces a counting rule that exactly n of the $T_j$'s will be set to 
1, and $V_3$ enforces a counting rule that the remaining m - n of the $T_j$'s will be set to 0. (Since $V_1$ 
and $V_3$ receive different responses, the 0 written in the fourth group of $V_3$, the value of an unchosen 
$T_j$, must be read as a symbol distinct from the null color.) For each 
triple, $T_s = (x_{i_s}, y_{j_s}, z_{k_s})$, we construct three query vectors, as follows. 

$$V_{s,1} = (0, \ldots, 0, s, 0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0, 0, 0, \ldots, 0),$$

where the s is in position $i_s$ in the first group and the 0 (the value of an unchosen $T_s$) is in 
position s in the fourth group. This vector has response $b(V_{s,1}) = 1$. Next, we construct 

$$V_{s,2} = (0, \ldots, 0;\; 0, \ldots, 0, s, 0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0, 0, 0, \ldots, 0),$$

where the s is in position $j_s$ in the second group and the 0 (the value of an unchosen $T_s$) is in 
position s in the fourth group. This vector has response $b(V_{s,2}) = 1$. Finally, we construct 

$$V_{s,3} = (0, \ldots, 0;\; 0, \ldots, 0;\; 0, \ldots, 0, s, 0, \ldots, 0;\; 0, \ldots, 0, 0, 0, \ldots, 0),$$

where the s is in position $k_s$ in the third group and the 0 (the value of an unchosen $T_s$) is in 
position s in the fourth group. This vector has response $b(V_{s,3}) = 1$. Intuitively, these three 
responses collectively form a "chooser" gadget, where we will either have $T_s = 0$ or the three 
variables $X_{i_s}$, $Y_{j_s}$, and $Z_{k_s}$ will each be set to have color s (and $T_s = 1$). 

This reduction can clearly be done in polynomial time. So all that remains is for us to show 
that it works. Suppose, then, that there is a possible solution to the given instance of 3DM. Then 
for each chosen triple, $T_s = (x_{i_s}, y_{j_s}, z_{k_s})$, we can assign colors $T_s = 1$, $X_{i_s} = s$, $Y_{j_s} = s$, and 
$Z_{k_s} = s$, which will satisfy each of the $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$ vector responses for this value of s. 
Likewise, setting $T_s = 0$ will satisfy each of the $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$ vector responses for a triple 
$T_s$ that is not chosen, since each of its elements takes the color of the chosen triple that covers it, 
which is different from s. Finally, given that there are n chosen triples, we will satisfy the three 
preliminary vector responses as well. 

Suppose, alternatively, that we have a vector V that satisfies all of our vector responses. We 
know that each $X_i$, $Y_j$, and $Z_k$ must be assigned a color other than 0. Since there are only m + 1 
colors, this implies each $X_i$, $Y_j$, and $Z_k$ must be assigned a color corresponding to a triple number, 
s. If this $T_s = 1$, then in order to have satisfied the vectors $V_{s,1}$, $V_{s,2}$, and $V_{s,3}$, we must have 
set $X_{i_s} = s$, $Y_{j_s} = s$, and $Z_{k_s} = s$, which implies we can include the triple $(X_{i_s}, Y_{j_s}, Z_{k_s})$ in our 
matching. If $T_s = 0$, then we do not include this triple in our matching. By the vector responses $V_2$ 
and $V_3$, we know that the number of triples chosen in this way is exactly n. Thus, we have found a 
valid 3-dimensional matching. ■ 
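
The construction used in the proof can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's own code: the names `build_queries` and `satisfiable` are hypothetical, triples are given as 0-based index triples, and the value marking an unchosen $T_s$ is encoded as an extra color m + 1 that is distinct from both the null color and the triple colors (so the sketch uses m + 2 color values). The exhaustive search is exponential and is included only to check the "if and only if" on tiny instances.

```python
from itertools import product
from typing import List, Sequence, Tuple

NULL = 0  # the "null" color, excluded from the hidden vector by the first enforcer


def build_queries(n: int, triples: Sequence[Tuple[int, int, int]]):
    """Return (guesses, responses) for a 3DM instance with |X| = |Y| = |Z| = n.

    The hidden vector is laid out as X | Y | Z | T, of length 3n + m.
    """
    m = len(triples)
    chosen = 1
    unchosen = m + 1  # ASSUMPTION: marker distinct from NULL and colors 1..m
    head = [NULL] * (3 * n)
    # Three enforcer queries: no NULL anywhere, n chosen T's, m - n unchosen T's.
    guesses: List[List[int]] = [head + [NULL] * m,
                                head + [chosen] * m,
                                head + [unchosen] * m]
    responses = [0, n, m - n]
    for s, (i, j, k) in enumerate(triples, start=1):
        # One query per coordinate of triple s: color s at the element's slot
        # plus the unchosen marker at T_s; exactly one of the two must match.
        for pos in (i, n + j, 2 * n + k):
            guess = [NULL] * (3 * n + m)
            guess[pos] = s
            guess[3 * n + s - 1] = unchosen
            guesses.append(guess)
            responses.append(1)
    return guesses, responses


def satisfiable(n: int, triples: Sequence[Tuple[int, int, int]]) -> bool:
    """Brute-force search for a vector consistent with every response."""
    guesses, responses = build_queries(n, triples)
    m = len(triples)
    for cand in product(range(m + 2), repeat=3 * n + m):
        if all(sum(c == g for c, g in zip(cand, guess)) == r
               for guess, r in zip(guesses, responses)):
            return True
    return False
```

For example, with n = 2 the triples (0, 0, 0) and (1, 1, 1) form a perfect matching, whereas (0, 0, 0) and (0, 1, 1) share an element of X and do not.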



Thus, it is extremely unlikely that we will be able to find a polynomial-time algorithm that 
can solve arbitrary Mastermind query sequences, even if they are single-count results. Note that 
this is not the same as a guarantee that discovering a string Q requires a long query sequence, 
however. For we show, in the section that follows, that such query strings, Q, can be discovered 
fairly efficiently using a single-count Mastermind algorithm. 

3 A Mastermind Algorithm for Single-Count Match Queries 

In this section, we explore an algorithm for the single-count Mastermind game, where the 
codebreaker, Dave, engages in a series of guesses against the unknown string, Q, each of which reveals 
only the single-count score between the query string Q and strings provided by Dave, in an 
iterative online fashion. Here, we show that Dave can learn Q with a sequence of 
$N\lceil \log K \rceil + \lceil (2 - 1/K)N \rceil + K$ guesses, where N is the length of Q and K is the size of the 
alphabet (whose members we call "colors"). 

We begin the algorithm for Dave by having him perform K queries, each of which is a vector of 
elements that are all the same color. This allows us to initially know the cardinality, $c_1, c_2, \ldots, c_K$, 
of every color in the unknown vector, Q. If any $c_i = 0$, then we remove the color i from our 
alphabet of colors, and update the value of K accordingly. The remainder of Dave's computation 
proceeds as a recursive divide-and-conquer algorithm, which is similar in structure to the approach 
of Chvatal [2], but improves his bound by almost a factor of 2, even though his algorithm was for 
the general two-color case, by reusing knowledge gained in previous recursive calls. 

The generic problem is to determine the values of all the elements in a range Q[l..r], which 
initially is the entire vector Q = Q[0..N - 1], assuming we know the cardinalities, $c_1, c_2, \ldots, c_K$, of 
every color in Q[l..r], and each $c_i > 0$. If $K \le 1$, we are done; so let us assume without loss of 
generality that $K \ge 2$. In addition, we assume inductively that we know d, the number of instances 
of color 1 outside of the range Q[l..r]. Initially, of course, d = 0. 

Given this initial setup, we split Q[l..r] into Q[l..m] and Q[m+1..r], where m is in the middle of 
the interval [l, r]. The main challenge, then, is to provide for Q[l..m] and Q[m+1..r] the same setup 
we had for Q[l..r]. This setup can be accomplished by determining the cardinalities, $x_1, x_2, \ldots, x_K$ 
and $y_1, y_2, \ldots, y_K$, of every color that respectively appears in Q[l..m] and Q[m+1..r]. We do this 
with a series of K - 1 additional queries, where we guess that the elements in Q[l..m] are of color 
i, for i = 2, 3, ..., K, and that the rest of Q is of color 1. Let the values of these queries be denoted 
as $b_2, b_3, \ldots, b_K$. The i-th such query matches the $x_i$ elements of color i in Q[l..m], the $y_1$ elements 
of color 1 in Q[m+1..r], and the d elements of color 1 outside Q[l..r]. Thus, at this point, we know 
the following: 

$$x_i + y_i = c_i, \quad \text{for } i = 1, 2, \ldots, K \qquad (1)$$
$$x_i + y_1 = b_i - d, \quad \text{for } i = 2, 3, \ldots, K \qquad (2)$$
$$x_1 + x_2 + \cdots + x_K = m - l + 1. \qquad (3)$$

Thus, we can determine $y_1$, as 

$$y_1 = \frac{c_1 + \sum_{i=2}^{K} (b_i - d) - (m - l + 1)}{K},$$

for $y_1$ is counted K times in the sum of $c_1$ and all the $(b_i - d)$'s, and the sum of the $x_i$'s is 
m - l + 1, by Equation (3). Given the value of $y_1$, we can then determine all the $x_i$ values, by using 
Equation (1) for $x_1$ and Equation (2) for $x_2, x_3, \ldots, x_K$. Moreover, once we have all these $x_i$ values, 
we can determine the values, $y_2, y_3, \ldots, y_K$, using Equation (1). Finally, we can determine the values 
$d' = d + y_1$ and $d'' = d + x_1$ and use these respectively for the role of d in Q[l..m] and Q[m+1..r]. 
This gives us all the values necessary to then recursively determine Q[l..m] and Q[m+1..r]. 
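
The divide-and-conquer procedure described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the name `discover_hidden_vector` and the `query` callback (which returns the black-peg count of a full-length guess) are hypothetical, colors are the integers 1..K, and, as a simplification, one fixed color `base` plays the role of "color 1" throughout the recursion even in ranges where it does not occur, instead of relabeling colors in each subproblem.

```python
from typing import Callable, Dict, List, Optional


def discover_hidden_vector(n: int, k: int,
                           query: Callable[[List[int]], int]) -> List[Optional[int]]:
    """Recover a hidden length-n vector over colors 1..k from black-peg counts."""
    if n == 0:
        return []
    # K monochromatic queries give the count of each color in the whole vector.
    totals = {c: query([c] * n) for c in range(1, k + 1)}
    present = {c: t for c, t in totals.items() if t > 0}
    base = min(present)  # ASSUMPTION: fixed filler color, the "color 1" of the text
    result: List[Optional[int]] = [None] * n

    def solve(l: int, r: int, counts: Dict[int, int], d: int) -> None:
        # counts: color -> occurrences in positions l..r (positive entries only)
        # d: occurrences of `base` outside positions l..r
        if len(counts) == 1:
            (color,) = counts
            result[l:r + 1] = [color] * (r - l + 1)
            return
        m = (l + r) // 2
        left_len = m - l + 1
        others = [c for c in counts if c != base]
        t = {}
        for c in others:
            guess = [base] * n
            guess[l:m + 1] = [c] * left_len
            t[c] = query(guess) - d          # equals x_c + y_base, Equation (2)
        c_base = counts.get(base, 0)
        # Closed form for y_base, with K = len(others) + 1.
        y_base = (c_base + sum(t.values()) - left_len) // (len(others) + 1)
        x = {c: t[c] - y_base for c in others}
        x[base] = c_base - y_base            # Equation (1) for the base color
        left = {c: v for c, v in x.items() if v > 0}
        right = {c: counts.get(c, 0) - x[c] for c in x
                 if counts.get(c, 0) - x[c] > 0}
        solve(l, m, left, d + y_base)        # d' = d + y_base
        solve(m + 1, r, right, d + x[base])  # d'' = d + x_base

    solve(0, n - 1, present, 0)
    return result
```

A caller would supply `query` as a function that compares each guess against the hidden string and returns only the number of matching positions.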

Let us, therefore, analyze the number, G(N, K), of vector guesses performed by this algorithm. 
Ignoring for the time being the initial set of K guesses, we can bound this parameter using the 
following recurrence: 

$$G(N, K) = 2G(N/2, K) + \min\{N, K - 1\}.$$

Thus, adding the initial K queries back in, we get that the total number of guesses is at most 

$$N\lceil \log K \rceil + \lceil (2 - 1/K)N \rceil + K.$$

Therefore, we have the following. 

Theorem 2: Given an unknown length-N string Q, defined on an alphabet of size K, a 
Mastermind algorithm can discover Q in polynomial time using $N\lceil \log K \rceil + \lceil (2 - 1/K)N \rceil + K$ tests 
against Q, each of which reveals only the number of positions where Q and the test string match. 
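
To illustrate the "almost a factor of 2" improvement numerically, the bound of Theorem 2 can be compared with the $2N\lceil \log K \rceil + 4N$ bound attributed to Chvatal [2] in Section 1.1. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the logarithm is base 2.

```python
def ceil_log2(k: int) -> int:
    """ceil(log2(k)) for k >= 1, computed with integer arithmetic."""
    return (k - 1).bit_length()


def single_count_bound(n: int, k: int) -> int:
    """N*ceil(log K) + ceil((2 - 1/K)*N) + K, assuming log base 2."""
    # ceil((2 - 1/K) * N) = ceil((2K - 1) * N / K), done exactly in integers.
    middle = -(-(2 * k - 1) * n // k)
    return n * ceil_log2(k) + middle + k


def chvatal_bound(n: int, k: int) -> int:
    """2N*ceil(log K) + 4N, assuming log base 2."""
    return 2 * n * ceil_log2(k) + 4 * n
```

For N = 100 and K = 128, these evaluate to 1028 and 1800 guesses, respectively.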

4 Conclusion 

We have shown that, even though the single-count and sequence-alignment Mastermind 
satisfiability problems are NP-complete, one can effectively construct single-count Mastermind algorithms 
on arbitrary character strings just by knowing basic information about the length of the strings and 
the number of characters in the alphabet used to construct those strings. 